-
Notifications
You must be signed in to change notification settings - Fork 996
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update ingestion to write feature metrics for validation #449
Update ingestion to write feature metrics for validation #449
Conversation
- Ignore StatsD execption since it is not gonna be thrown when using non blocking StatsD client: https://github.com/DataDog/java-dogstatsd-client#unix-domain-socket-support - Reuse tags variables - Add validation for correct feature set reference in FeatureRow
And make the checking of exporter type in ingestion more robust.
Previosly missing the word 'ingestion'
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: davidheryanto The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
ingestion/src/main/java/feast/ingestion/transform/metrics/WriteMetricsTransform.java
Show resolved
Hide resolved
/hold |
Can this (and any other metrics in ingestion) be migrated to standard Beam custom metrics? https://beam.apache.org/releases/javadoc/2.1.0/org/apache/beam/sdk/metrics/Metrics.html These will automatically end up in Stackdriver (when using Dataflow) and you can use a Stackdriver exporter for Prometheus: https://github.com/helm/charts/tree/master/stable/stackdriver-exporter |
@davidheryanto In addition to the test, can we please also add documentation to describe to users how they can benefit from this? Specifically
|
Hi @Yanson, thanks for you input. I will let @davidheryanto and @zhilingc respond to you on this point in more detail, but we previously did use Beam custom metrics but I believe we moved away due to its metrics being scoped to a global window and the limited types of instrumentation compared to statsd/prom. |
Thanks for the suggestion! I think its a viable option. Beam metrics behave very similarly to prometheus exporters, and as long as its exported out to prom (which can compute time windowed metrics using the delta) it should work just fine. |
- Update Field, Feature and Entity class with fields from presence_constraints, shape_type and domain_info
- Create 2 distinct services, one for metrics, one for statsd because they use different protocols (TCP and UDP) respectively and exposing this via Kube LoadBalancer won't work by default because most LoadBalancer only supports either TCP or UDP - Allow users to define statsd_mapping from helm values
27f43fd
to
2887e8c
Compare
- Include more filters for constraint validation (so no many to many results for subtrations) - Use datasource: null so dashboard will use "default" datasource - Unset uid so it will be autogenerated during import
2887e8c
to
0b72ddd
Compare
f179240
to
5593040
Compare
acbe497
to
c4d66e7
Compare
c4d66e7
to
1a6ebc5
Compare
/hold cancel |
/retest |
@davidheryanto waiting for the tests to pass before I evaluate this PR |
"%%bash\n", | ||
"# Sample data from BigQuery public dataset: bikeshare stations\n", | ||
"# https://cloud.google.com/bigquery/public-data\n", | ||
"wget https://raw.githubusercontent.com/davidheryanto/feast/update-ingestion-metrics-for-validation/examples/data_validation/bikeshare_stations.csv\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
davidheryanto?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes because this PR is not merged yet. I will update it once merged
dashboards: | ||
feast: | ||
feast-ingestion-dashboard: | ||
url: https://raw.githubusercontent.com/davidheryanto/feast/update-ingestion-metrics-for-validation/ingestion/src/main/resources/grafana-dashboard.json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
davidheryanto again
sleep 20 | ||
tail -n10 /var/log/kafka.log | ||
kafkacat -b localhost:9092 -L | ||
|
||
echo " | ||
============================================================ | ||
Installing Telegraf with StatsD input and Prometheus output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be changed to statsd_exporter as below.
- name: prometheus-statsd-exporter | ||
version: 0.1.2 | ||
condition: prometheus-statsd-exporter.enabled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be moved out of core.
|
||
@presence.setter | ||
def presence(self, presence: schema_pb2.FeaturePresence): | ||
if not isinstance(presence, schema_pb2.FeaturePresence): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add comments for the setters?
self.value_count = feature.value_count | ||
|
||
def update_domain_info( | ||
self, feature: schema_pb2.Feature, schema: schema_pb2.Schema = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More missing comments
@@ -167,3 +170,94 @@ def test_add_features_from_df_success( | |||
) | |||
assert len(my_feature_set.features) == feature_count | |||
assert len(my_feature_set.entities) == entity_count | |||
|
|||
def test_import_tfx_schema(self): | |||
tests_folder = pathlib.Path(__file__).parent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this depend on where pytest is launched from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh pathlib.Path(__file__)
will give the correct relative path to test_feature_set.py
. If pytest is called from different different context, the relative path will change to the correct location too.
@@ -11,21 +11,24 @@ | |||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we also add the other missing tests like for your update methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok will add it
/hold |
This PR is still missing |
Closing this because the master has been updated to include writing metrics for feature values, which override with the changes here. |
I believe the last comment refers to #486, for reference. |
What this PR does / why we need it:
This PR updates Feast Ingestion to write feature level metrics so feature values and presence can be validated. The schema for feature validation follows that from Tensorflow metadata https://github.com/tensorflow/metadata/blob/master/tensorflow_metadata/proto/v0/schema.proto. The
FeatureSpec
andEntitySpec
inFeatureSet.proto
has been updated to partially support the validation schema in #438.This PR only addresses the Milestone 1 requirements in https://docs.google.com/document/d/1TPmd7r4mniL9Y-V_glZaWNo5LMXLshEAUpYsohojZ-8
In order to fully make use of these metrics, it is assumed that:
A sample
grafana-dashboard.json
is also provided for a generic Grafana dasboard that users can import to use the Prometheus metrics. The alerts can then be configured accordingly. Sample dashboard looks like the following:Which issue(s) this PR fixes:
Partially addresses #172
Does this PR introduce a user-facing change?: